79 research outputs found
Wavefront Marching Methods: A Unified Algorithm to Solve Eikonal and Static Hamilton-Jacobi Equations
© 2020 IEEE. This version of the article has been accepted for publication, after peer review. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The Version of Record is available online at: https://doi.org/10.1109/TPAMI.2020.2993500
[Abstract]
This paper presents a unified propagation method for dealing with both the classic Eikonal equation, where the motion direction does not affect the propagation, and the more general static Hamilton-Jacobi equations, where it does. Classic Fast Marching Method (FMM) techniques solve the Eikonal equation with O(M log M) complexity (or O(M) with some modifications). Solving the more general static Hamilton-Jacobi equation, however, requires a higher complexity. The proposed framework maintains O(M log M) complexity for both problems while achieving higher accuracy than the available state of the art. The key idea behind the proposed method is the creation of ‘mini wave-fronts’, where the solution is interpolated to minimize the discretization error. Experimental results show that our algorithm can outperform the state of the art in both precision and computational cost.
The authors would like to thank the Spanish Ministerio de Economía y Competitividad (research project TIN2015-65069-C2-1-R), the Xunta de Galicia (research projects ED431C 2018/34 and Centro Singular de Investigación de Galicia, accreditation 2016-2019) and the European Union (European Regional Development Fund) for their financial support. Brais Cancela acknowledges the support of the Xunta de Galicia under its postdoctoral program.
Xunta de Galicia; ED431C 2018/3
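The following is background on the baseline the abstract refers to. It is a minimal sketch of the classic first-order Fast Marching Method on a 2D grid. It uses a binary heap as the narrow band, which gives the O(M log M) cost quoted above for M grid points. It is not the paper's wavefront marching method. It only handles the isotropic Eikonal equation F |∇T| = 1. The function names, the four-neighbour stencil, the unit spacing `h` and the single-source setup are assumptions chosen for illustration.

```python
import heapq
import math


def _local_update(T, accepted, i, j, f, h):
    """First-order upwind update for cell (i, j) using only accepted neighbours."""
    rows, cols = len(T), len(T[0])

    def known(a, b):
        if 0 <= a < rows and 0 <= b < cols and accepted[a][b]:
            return T[a][b]
        return math.inf

    a = min(known(i - 1, j), known(i + 1, j))  # best vertical neighbour
    b = min(known(i, j - 1), known(i, j + 1))  # best horizontal neighbour
    if a > b:
        a, b = b, a
    c = h / f  # travel time across one cell at local speed f
    if b - a >= c:
        # Only the smaller neighbour is upwind: one-sided update.
        return a + c
    # Both neighbours contribute: solve (t - a)^2 + (t - b)^2 = c^2.
    return (a + b + math.sqrt(2 * c * c - (a - b) ** 2)) / 2


def fast_marching(speed, source, h=1.0):
    """Arrival times T with F * |grad T| = 1, starting from one source cell."""
    rows, cols = len(speed), len(speed[0])
    T = [[math.inf] * cols for _ in range(rows)]
    accepted = [[False] * cols for _ in range(rows)]
    si, sj = source
    T[si][sj] = 0.0
    heap = [(0.0, si, sj)]  # narrow band ordered by tentative arrival time
    while heap:
        _, i, j = heapq.heappop(heap)
        if accepted[i][j]:
            continue  # stale duplicate of a cell already accepted with a smaller time
        accepted[i][j] = True
        for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if 0 <= ni < rows and 0 <= nj < cols and not accepted[ni][nj]:
                t_new = _local_update(T, accepted, ni, nj, speed[ni][nj], h)
                if t_new < T[ni][nj]:
                    T[ni][nj] = t_new
                    heapq.heappush(heap, (t_new, ni, nj))
    return T


if __name__ == "__main__":
    uniform = [[1.0] * 5 for _ in range(5)]
    for row in fast_marching(uniform, (0, 0)):
        print(" ".join(f"{t:5.2f}" for t in row))
```

Each cell is accepted exactly once, in order of increasing arrival time. That ordering is what makes the single pass valid for the Eikonal case. The abstract notes that the same ordering argument is harder to keep when the propagation depends on direction.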
A scalable saliency-based Feature selection method with instance level information
Classic feature selection techniques remove those features that are either
irrelevant or redundant, yielding a subset of relevant features that supports
better knowledge extraction. This allows the creation of compact models that
are easier to interpret. Most of these techniques work over the whole dataset,
but they cannot provide the user with useful information when only
instance-level information is needed. In short, given any example, classic
feature selection algorithms give no information about which features are the
most relevant for that particular sample. This work aims to overcome this
limitation by developing a novel feature selection method, called
Saliency-based Feature Selection (SFS), based on deep-learning saliency
techniques. Our experimental results show that this algorithm can be
successfully used not only in neural networks, but also in any architecture
trained with gradient descent techniques.
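The notion of instance-level relevance can be illustrated with a gradient-times-input saliency score on a model trained by gradient descent. This is a minimal sketch under stated assumptions, not the authors' SFS algorithm. It assumes a logistic regression model and scores feature j for one sample x as |x_j · ∂p/∂x_j|. The helper names `train_logreg` and `instance_saliency` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the logistic (cross-entropy) loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the loss with respect to the logit
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def instance_saliency(w, b, x):
    """Gradient-times-input relevance of each feature for a single sample x."""
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    # For a logistic model, dp/dx_j = p * (1 - p) * w_j.
    return [abs(xj * p * (1.0 - p) * wj) for wj, xj in zip(w, x)]


if __name__ == "__main__":
    # Feature 0 determines the label; feature 1 is balanced noise.
    X = [[-2, 1], [-2, -1], [-1, 1], [-1, -1], [1, 1], [1, -1], [2, 1], [2, -1]]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    w, b = train_logreg(X, y)
    scores = instance_saliency(w, b, [2, 1])
    print(sorted(range(len(scores)), key=lambda j: -scores[j]))
```

The scores depend on the sample through both x_j and p(1 - p). Two instances can therefore receive different relevance values even under the same trained model. That per-sample information is the kind a dataset-level feature ranking does not provide.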
Distributed Correlation-Based Feature Selection in Spark
CFS (Correlation-Based Feature Selection) is a feature selection (FS) algorithm
that has been successfully applied to classification problems in many domains.
We describe Distributed CFS (DiCFS) as a completely redesigned, scalable,
parallel and distributed version of the CFS algorithm, capable of dealing with
the large volumes of data typical of big data applications. Two versions of the
algorithm were implemented and compared using the Apache Spark cluster
computing model, currently gaining popularity due to its much shorter
processing times compared with Hadoop's MapReduce model. We tested our
algorithms on four publicly available datasets, each consisting of a large
number of instances, two of which also consist of a large number of features.
The results show that our algorithms were superior in terms of both
time-efficiency and scalability. By leveraging a computer cluster, they were
able to handle larger datasets than the non-distributed WEKA version while
maintaining the quality of the results, i.e., exactly the same features were
returned by our algorithms as by the original algorithm available in WEKA.
Comment: 25 pages, 5 figures
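CFS scores a candidate subset S of k features with Hall's merit heuristic:

Merit_S = k · r_cf / sqrt(k + k(k - 1) · r_ff)

Here r_cf is the mean feature-class correlation and r_ff is the mean feature-feature inter-correlation. The sketch below is a single-machine illustration, not DiCFS. It evaluates that merit and runs a simple greedy forward search on a precomputed correlation table. The original CFS measures correlation with symmetrical uncertainty and uses best-first search. Taking the correlations as given and replacing the search with greedy forward selection are simplifying assumptions, and the function names are hypothetical.

```python
import math
from itertools import combinations


def cfs_merit(subset, r_cf, r_ff):
    """Hall's CFS merit of a feature subset, given precomputed correlations."""
    k = len(subset)
    if k == 0:
        return 0.0
    mean_cf = sum(r_cf[f] for f in subset) / k
    pairs = list(combinations(subset, 2))
    mean_ff = sum(r_ff[a][b] for a, b in pairs) / len(pairs) if pairs else 0.0
    return k * mean_cf / math.sqrt(k + k * (k - 1) * mean_ff)


def greedy_cfs(r_cf, r_ff):
    """Forward selection: repeatedly add the feature that most improves the merit."""
    selected, best = [], 0.0
    remaining = set(range(len(r_cf)))
    while remaining:
        cand = max(remaining, key=lambda f: cfs_merit(selected + [f], r_cf, r_ff))
        score = cfs_merit(selected + [cand], r_cf, r_ff)
        if score <= best:
            break  # no remaining feature improves the subset
        selected.append(cand)
        best = score
        remaining.remove(cand)
    return selected, best


if __name__ == "__main__":
    r_cf = [0.8, 0.7, 0.6]        # feature-class correlations
    r_ff = [[1.0, 0.9, 0.1],      # features 0 and 1 are highly redundant
            [0.9, 1.0, 0.1],
            [0.1, 0.1, 1.0]]
    print(greedy_cfs(r_cf, r_ff))
```

In this toy table, feature 1 is strongly correlated with the class. The search still skips it because it is largely redundant with feature 0, and that redundancy penalty is what the denominator of the merit encodes.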
On developing an automatic threshold applied to feature selection ensembles
© 2019. This manuscript version is made available under the CC-BY-NC-ND 4.0 license https://creativecommons.org/licenses/by-nc-nd/4.0/. This version of the article "B. Seijo-Pardo, V. Bolón-Canedo, and A. Alonso-Betanzos, «On developing an automatic threshold applied to feature selection ensembles», Information Fusion, vol. 45, pp. 227-245, Jan. 2019" has been accepted for publication in Information Fusion. The Version of Record is available online at https://doi.org/10.1016/j.inffus.2018.02.007
[Abstract]
Feature selection ensemble methods are a recent approach that aims to add diversity to sets of selected features, improving performance and obtaining more robust and stable results. However, using an ensemble introduces the need for an aggregation step to combine the outputs of all the methods that make up the ensemble. Moreover, when trying to improve computational efficiency, ranking methods that order all initial features are preferred, so an additional thresholding step is also required. In this work, two different ensemble designs based on ranking methods are described. The main difference between them is the order in which the combination and thresholding steps are performed. In addition, a new automatic threshold based on the combination of three data complexity measures is proposed. It is compared with traditional thresholding approaches that retain a fixed percentage of features. The behavior of these methods was tested according to SVM classification accuracy in three different scenarios, with satisfactory results. The scenarios were synthetic datasets and two types of real datasets: those where the sample size is much larger than the feature size, and those where the feature size is much larger than the sample size.
This research has been financially supported in part by the Spanish Ministerio de Economía y Competitividad (research project TIN 2015-65069-C2-1-R), by the Xunta de Galicia (research projects GRC2014/035 and the Centro Singular de Investigación de Galicia, accreditation 2016–2019) and by the European Union (FEDER/ERDF).
Xunta de Galicia; GRC2014/03
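The two ensemble designs differ in whether the rankings are combined before or after thresholding. This sketch shows only the combine-then-threshold order. It uses mean-rank aggregation and the traditional fixed-percentage threshold that the abstract uses as a baseline. The paper's automatic threshold based on data complexity measures is not reproduced. The choice of mean position as the aggregator and the function names are assumptions for illustration.

```python
import math


def aggregate_rankings(rankings):
    """Combine several rankings (lists of feature ids, best first) by mean position."""
    positions = {}
    for ranking in rankings:
        for pos, feat in enumerate(ranking):
            positions.setdefault(feat, []).append(pos)
    mean_pos = {f: sum(p) / len(p) for f, p in positions.items()}
    # Lower mean position means more relevant; ties are broken by feature id.
    return sorted(mean_pos, key=lambda f: (mean_pos[f], f))


def threshold_top_percent(ranking, percent):
    """Retain the top `percent`% of a ranking, keeping at least one feature."""
    n_keep = max(1, math.ceil(len(ranking) * percent / 100))
    return ranking[:n_keep]


if __name__ == "__main__":
    # Three rankers (e.g. different FS methods or data partitions) over 4 features.
    rankings = [[2, 0, 1, 3], [0, 2, 3, 1], [2, 1, 0, 3]]
    combined = aggregate_rankings(rankings)
    print(combined, threshold_top_percent(combined, 50))
```

The threshold-then-combine design would instead apply `threshold_top_percent` to each individual ranking first and then merge the resulting subsets. The abstract identifies this ordering as the main difference between the two designs.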
Low-Precision Feature Selection on Microarray Data: An Information Theoretic Approach
Funded for open access publication: Universidade da Coruña/CISUG
[Abstract]
The number of interconnected devices surrounding us every day, such as personal wearables, cars, and smart homes, has recently increased. Internet of Things devices monitor many processes and can use machine learning models for pattern recognition and even decision making. They have the added advantage of reducing network congestion by performing computations near the data sources. Their main restriction is their low computing capacity. Thus, machine learning algorithms are needed that maintain accuracy while exploiting mechanisms such as low-precision representations. In this paper, low-precision mutual information-based feature selection algorithms are applied to DNA microarray datasets. The results show that 16-bit and sometimes even 8-bit representations of these algorithms can be used without significant variations in the final classification results.
This work has been supported by the grant Machine Learning on the Edge - Ayudas Fundación BBVA a Equipos de Investigación Científica 2019. It has also been possible thanks to the support received from the National Plan for Scientific and Technical Research and Innovation of the Spanish Government (Grant PID2019-109238GB-C2), and from the Xunta de Galicia (Grant ED431C 2018/34) with European Union ERDF funds. CITIC, as a Research Center accredited by the Galician University System, is funded by “Consellería de Cultura, Educación e Universidades from Xunta de Galicia”. That funding comes 80% from ERDF funds (ERDF Operational Programme Galicia 2014-2020) and the remaining 20% from “Secretaría Xeral de Universidades” (Grant ED431G 2019/01). Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature.
Xunta de Galicia; ED431C 2018/34
Xunta de Galicia; ED431G 2019/0
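One crude way to see how reduced numeric precision affects a mutual-information score is to round every probability estimate to a fixed-point grid before evaluating I(X;Y) = Σ p(x,y) log2[p(x,y) / (p(x) p(y))]. The grid has a given number of fractional bits. This only simulates quantized probabilities inside ordinary floating point. It is not the paper's low-precision implementation, and the `frac_bits` parameter and round-to-nearest scheme are assumptions.

```python
import math
from collections import Counter


def quantize(p, frac_bits):
    """Round p to the nearest multiple of 2**-frac_bits (simulated fixed point)."""
    if frac_bits is None:
        return p
    scale = 2 ** frac_bits
    return round(p * scale) / scale


def mutual_information(x, y, frac_bits=None):
    """Plug-in estimate of I(X;Y) in bits for two discrete sequences."""
    n = len(x)
    px = {k: quantize(c / n, frac_bits) for k, c in Counter(x).items()}
    py = {k: quantize(c / n, frac_bits) for k, c in Counter(y).items()}
    mi = 0.0
    for (a, b), c in Counter(zip(x, y)).items():
        pxy = quantize(c / n, frac_bits)
        if pxy == 0.0 or px[a] == 0.0 or py[b] == 0.0:
            continue  # skip terms whose (rounded) probability is zero
        mi += pxy * math.log2(pxy / (px[a] * py[b]))
    return mi


if __name__ == "__main__":
    x = [0, 1, 2, 3, 0, 1, 2, 3, 0, 1]   # a discretized feature
    y = [0, 0, 1, 1, 0, 0, 1, 1, 1, 0]   # class labels
    for bits in (None, 16, 8, 4):
        print(bits, round(mutual_information(x, y, frac_bits=bits), 4))
```

Comparing the scores for decreasing `frac_bits` gives a rough feel for when a reduced-precision representation starts to change a feature's score. The abstract states the actual conclusion in terms of final classification results, not raw scores.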
Working methodology and experiences of collaborative learning and continuous assessment in the Multi-Agent Systems course
This work describes the experiences carried out to adapt the Expert Systems course of the Computer Engineering degree at the University of A Coruña to the European Higher Education Area. The new approach focuses mainly on collaborative activities, the incorporation of virtual resources, and the continuous assessment system employed. These are largely possible because the number of students enrolled in the course (an average of 21 over the last two academic years) is suitable for this kind of experience. With this approach, 95% of the enrolled students (100% of those who sat the exam) passed the course and showed a high level of assimilation of the concepts.
How Important Is Data Quality? Best Classifiers vs Best Features
Funded for open access publication: Universidade da Coruña/CISUG
[Abstract]
Choosing the appropriate classifier for a given scenario is not easy. First, there is an increasingly large number of available algorithms belonging to different families. Second, there is a lack of methodologies that can help recommend, in advance, a given family of algorithms for a certain type of dataset. Moreover, the performance of most of these classification algorithms degrades when they are faced with datasets containing irrelevant and/or redundant features. In this work we analyze the impact of feature selection on classification over several synthetic and real datasets. The experimental results show that the choice of classifier matters less once an appropriate preprocessing step has been applied. This not only simplifies the choice but also improves the results on almost all the datasets tested.
This work has been supported by the National Plan for Scientific and Technical Research and Innovation of the Spanish Government (Grant PID2019-109238 GB-C2), and by the Xunta de Galicia (Grant ED431C 2018/34) with European Union ERDF funds. CITIC, as a Research Center accredited by the Galician University System, is funded by “Consellería de Cultura, Educación e Universidades from Xunta de Galicia”. That funding comes 80% from ERDF funds (ERDF Operational Programme Galicia 2014–2020) and the remaining 20% from “Secretaría Xeral de Universidades” (Grant ED431G 2019/01). Funding for open access charge: Universidade da Coruña/CISUG.
Xunta de Galicia; ED431C 2018/34
Xunta de Galicia; ED431G 2019/0
An approach to the European Higher Education Area based on the development of software projects in Knowledge Engineering
This paper presents a teaching proposal for the Knowledge Engineering course of the Computer Engineering degree. The proposal is an effort to adapt the course to the European Higher Education Area, where one of the main problems is usually the large number of students in the classroom. This article describes how we handled this problem in order to adapt the course using project-based learning, together with the advantages and drawbacks we found. In addition, the system used, with which we have generally obtained positive results, can easily be extrapolated to other courses in Computer Engineering curricula, such as those related to Software Engineering.
On the Effectiveness of Convolutional Autoencoders on Image-Based Personalized Recommender Systems
[Abstract]
Over the years, recommender systems have become remarkably successful. Because of the massive number of options within a consumer's reach, a collaborative environment has emerged in which users from all over the world seek and share their opinions on all types of products. Specifically, millions of images tagged with users’ tastes are available on the web. Therefore, applying deep learning techniques to these tasks has become a key issue, and there is growing interest in using images to solve them, particularly through feature extraction. This work explores the potential of using only images as sources of information for modeling users’ tastes and proposes a method to provide gastronomic recommendations based on them. To achieve this, we focus on the pre-processing and encoding of the images, proposing the use of a pre-trained convolutional autoencoder as a feature extractor. We compare our method with the standard approach of using convolutional neural networks and study the effect of applying transfer learning. The results show that, in this case, it is better to use only the specific knowledge of the target domain, even if fewer examples are available.
This research has been financially supported in part by European Union FEDER funds, by the Spanish Ministerio de Economía y Competitividad (research project PID2019-109238GB), by the Consellería de Industria of the Xunta de Galicia (research project GRC2014/035), and by the Principado de Asturias Regional Government (research project IDI-2018-000176). CITIC, as a Research Center of the Galician University System, is financed by the Consellería de Educación, Universidades e Formación Profesional (Xunta de Galicia) through the ERDF (80%), Operational Programme ERDF Galicia 2014–2020, and the remaining 20% by the Secretaria Xeral de Universidades (ref. ED431G 2019/01).
Xunta de Galicia; GRC2014/035
Gobierno del Principado de Asturias; IDI-2018-000176
Xunta de Galicia; ED431G 2019/0
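Once images have been encoded, for example by the encoder half of a convolutional autoencoder, their feature vectors can feed a simple content-based ranking. This is a minimal sketch that assumes the image embeddings are already available as plain vectors. It builds a user profile as the mean embedding of the images a user liked and ranks candidate items by cosine similarity to that profile. This scoring rule and the names `user_profile` and `recommend` are illustrative assumptions, not the recommendation method evaluated in the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def user_profile(liked_embeddings):
    """Mean of the embeddings of the images a user has liked."""
    dim = len(liked_embeddings[0])
    n = len(liked_embeddings)
    return [sum(e[d] for e in liked_embeddings) / n for d in range(dim)]


def recommend(liked_embeddings, candidates, top_k=3):
    """Return the ids of the top_k candidate items closest to the user profile."""
    profile = user_profile(liked_embeddings)
    scored = sorted(candidates.items(),
                    key=lambda kv: cosine(profile, kv[1]), reverse=True)
    return [item_id for item_id, _ in scored[:top_k]]


if __name__ == "__main__":
    liked = [[1.0, 0.0], [0.9, 0.1]]  # hypothetical 2-D image embeddings
    candidates = {"a": [1.0, 0.05], "b": [0.0, 1.0], "c": [0.7, 0.7]}
    print(recommend(liked, candidates, top_k=2))
```

Only the embeddings change when a different image encoder is used. This makes the same ranking step a convenient way to compare, for instance, autoencoder features against CNN features.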
Regression Tree Based Explanation for Anomaly Detection Algorithm
[Abstract]
This work presents EADMNC (Explainable Anomaly Detection on Mixed Numerical and Categorical spaces), a novel approach that adds explanations to ADMNC. ADMNC is an anomaly detection algorithm that provides accurate detections on mixed numerical and categorical input spaces. Our improved algorithm leverages the formulation of the ADMNC model to offer pre-hoc explainability based on CART (Classification and Regression Trees). The explanation is presented as a segmentation of the input data into homogeneous groups that can be described with a few variables, offering supervisors novel information for justifications. To demonstrate scalability and interpretability, we report experimental results on large real-world datasets from the network intrusion detection domain.
This research was partially funded by European Union ERDF funds, Ministerio de Ciencia e Innovación grant number PID2019-109238GB-C22, Xunta de Galicia through the accreditation of Centro Singular de Investigación 2016-2020, Ref. ED431G/01 and Grupos de Referencia Competitiva, Ref. GRC2014/035.
Xunta de Galicia; ED431G/01
Xunta de Galicia; GRC2014/03
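EADMNC builds its explanations on CART. The core step of a CART regression tree is choosing the split that most reduces the sum of squared errors of a target. This is a minimal sketch of that step for one numerical feature. Using a detector's raw anomaly scores as the regression target is an assumption made here for illustration, not the paper's exact formulation. The names `sse` and `best_split` are hypothetical.

```python
def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(feature, target):
    """Best CART-style regression split on one numerical feature.

    Samples with feature <= threshold go left, the rest go right.
    Returns (threshold, total_sse), or (None, sse(target)) if no split helps.
    """
    pairs = sorted(zip(feature, target))
    best_t, best_cost = None, sse(target)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical feature values cannot be separated
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)
        if cost < best_cost:
            best_t = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint threshold
            best_cost = cost
    return best_t, best_cost


if __name__ == "__main__":
    # Hypothetical example: connection duration vs. anomaly score.
    duration = [1, 2, 3, 10, 11, 12]
    score = [0.1, 0.2, 0.1, 0.9, 0.8, 0.9]
    print(best_split(duration, score))
```

Applying this step recursively to each side yields a tree whose leaves are groups with homogeneous scores, each described by a few threshold conditions. This matches the kind of segmentation the abstract describes.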